Term Weighting Based Indexing Class and Indexing Short Document for Indonesian Thesis Title Classification

Abstract

Document classification is now relatively easy because recent methods achieve strong results. Classification using the TF-IDF-ICF term weighting method has been widely studied, but the documents used in that research are generally large. When TF-IDF-ICF is applied to a short text document such as a thesis title, the weighting does not produce an ideal score: IDF gives a small weight to words that appear in many documents, and ICF likewise gives a small weight to words that appear often across the classes, whereas such words should receive a high score because they form the core of the document. This study therefore investigates term weighting based on class indexing and short-document indexing, namely TF-IDF-ICF-IDSF. The weighting is evaluated by comparing Naïve Bayes and SVM classifiers. The dataset consists of thesis titles from Informatics Education students at Trunojoyo Madura University. The test results show that TF-IDF-ICF-IDSF outperforms the other weighting schemes, achieving 91% precision, 93% recall, 86% F1-score, and 84% accuracy.
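To illustrate the baseline that the abstract criticizes, the following is a minimal sketch of TF-IDF-ICF weighting on short titles. It assumes one common formulation, idf = 1 + ln(N / df) and icf = 1 + ln(C / cf), which may differ from the exact formulas used in the paper. The function name `tf_idf_icf`, the toy titles, and the class labels are hypothetical. The IDSF factor is not defined in the abstract, so it is deliberately left out.

```python
import math
from collections import Counter, defaultdict


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   -- list of token lists (e.g. tokenized thesis titles)
    labels -- list of class labels, one per document

    Assumed formulation (not taken from the paper):
        weight = tf * (1 + ln(N / df)) * (1 + ln(C / cf))
    where N is the number of documents, df the number of documents
    containing the term, C the number of classes, and cf the number
    of classes in which the term occurs.
    """
    n_docs = len(docs)
    n_classes = len(set(labels))

    doc_freq = Counter()                   # term -> number of documents
    classes_with_term = defaultdict(set)   # term -> set of class labels
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            classes_with_term[term].add(label)

    weights = []
    for tokens in docs:
        row = {}
        for term, count in Counter(tokens).items():
            idf = 1.0 + math.log(n_docs / doc_freq[term])
            icf = 1.0 + math.log(n_classes / len(classes_with_term[term]))
            row[term] = count * idf * icf
        weights.append(row)
    return weights


# Hypothetical toy data: three short titles, one class each.
titles = [
    ["sistem", "klasifikasi", "judul"],
    ["sistem", "informasi", "akademik"],
    ["klasifikasi", "citra", "digital"],
]
labels = ["text-mining", "information-system", "image-processing"]

weights = tf_idf_icf(titles, labels)
for title, row in zip(titles, weights):
    print(title, {term: round(w, 3) for term, w in row.items()})
```

In titles this short, nearly every term has a frequency of 1, so the weight is driven almost entirely by the IDF and ICF factors. A term shared by several titles or classes (here `sistem`) ends up with a lower weight than a term that occurs only once (here `judul`), which is the behavior the abstract identifies as a problem for short documents.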

Similar resources

Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models

We describe the application of probabilistic indexing and retrieval methods to the TREC material. For document indexing, we apply a description-oriented approach which uses relevance feedback information from previous queries run on the same collection. This method is also very flexible w.r.t. the underlying document representation. In our experiments, we consider single words and phrases and use...

Text Document Classification: an Approach Based on Indexing

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further the corresponding classification technique has been proposed for efficient classification of tex...

Ontology-Based Framework for Document Indexing

The work presented in this paper addresses a project of the Computer Centre CIRTIL who supported it. This company wants to save and capitalize its knowledge and its know-how about the production activities, especially concerning the technical incidents relating to software applications encountered during the exploitation time. Indeed using a well accessing documents base, actors will be able to...

Document indexing: a concept-based approach to term weight estimation

Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, ...

Using Term Co-occurrence Data for Document Indexing and Retrieval*

In the vector space model for information retrieval, term vectors are pair-wise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of do...

Journal

Journal title: J-Cosine (Journal of Computer Science and Informatics Engineering)

Year: 2022

ISSN: 2541-0806

DOI: https://doi.org/10.29303/jcosine.v6i2.471